{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to use the pre-trained model in your own code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Complete the Setup instructions from the Readme and download the pre-trained models\n",
    "- The models were tested with Keras v. 2.1.5 and tensorflow 1.6.0\n",
    "- You may need to install `hd5py` with pip and then re-install numpy==1.13.1 if it gets updated"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Make sure that the directory of the project is in your Python PATH\n",
    "sys.path.insert(0, \"relation_extraction/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from core.parser import RelParser\n",
    "from core import keras_models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. You need to tokenize and part-of-speech tag your data\n",
    "- The easiest way to so is to use Stanford CoreNLP server with the pycorenlp library\n",
    "- Install and start the CoreNLP server with english models as instructed here: [CoreNLP Server](https://stanfordnlp.github.io/CoreNLP/corenlp-server.html)\n",
    "- Install the pycorenlp python library: `pip install pycorenlp`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pycorenlp import StanfordCoreNLP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "corenlp = StanfordCoreNLP('http://semanticparsing:9000')\n",
    "corenlp_properties = {\n",
    "    'annotators': 'tokenize, pos, ner',\n",
    "    'outputFormat': 'json'\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def get_tagged_from_server(input_text):\n",
    "    \"\"\"\n",
    "    Send the input_text to the CoreNLP server and retrieve the tokens, named entity tags and part-of-speech tags.\n",
    "    \"\"\"\n",
    "    corenlp_output = corenlp.annotate(input_text,properties=corenlp_properties).get(\"sentences\", [])[0]\n",
    "    tagged = [(t['originalText'], t['ner'], t['pos']) for t in corenlp_output['tokens']]\n",
    "    return tagged"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('Germany', 'LOCATION', 'NNP'), ('is', 'O', 'VBZ'), ('a', 'O', 'DT'), ('country', 'O', 'NN'), ('in', 'O', 'IN'), ('Europe', 'LOCATION', 'NNP')]\n"
     ]
    }
   ],
   "source": [
    "print(get_tagged_from_server(\"Germany is a country in Europe\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('Star', 'O', 'NNP'), ('Wars', 'O', 'NNP'), ('VII', 'O', 'NNP'), ('is', 'O', 'VBZ'), ('an', 'O', 'DT'), ('American', 'MISC', 'JJ'), ('space', 'O', 'NN'), ('opera', 'O', 'NN'), ('epic', 'O', 'NN'), ('film', 'O', 'NN'), ('directed', 'O', 'VBN'), ('by', 'O', 'IN'), ('J.', 'PERSON', 'NNP'), ('J.', 'PERSON', 'NNP'), ('Abrams', 'PERSON', 'NNP'), ('.', 'O', '.')]\n"
     ]
    }
   ],
   "source": [
    "print(get_tagged_from_server(\"Star Wars VII is an American space opera epic film directed by  J. J. Abrams.\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- You can also generate a similar output with any part-of-speech tagger of your choice und use it with our models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Extract entity mentions and generate an empty graph of relations in the input sentence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from core import entity_extraction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Convert the input string into a list of tuples with the Stanford CoreNLP as explained above\n",
    "tagged = get_tagged_from_server(\"Germany is a country in Europe\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'left': [0], 'right': [5]}, {'left': [0], 'right': [3]}, {'left': [5], 'right': [3]}]\n"
     ]
    }
   ],
   "source": [
    "entity_fragments = entity_extraction.extract_entities(tagged)\n",
    "edges = entity_extraction.generate_edges(entity_fragments)\n",
    "non_parsed_graph = {'tokens': [t for t, _, _ in tagged],\n",
    "                    'edgeSet': edges}\n",
    "print(edges)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Empty relations are called edges and they have two attributes: 'left' and 'right' that contain token indices of entity mentions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Load the pre-trained relation extraction model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# the glove embeddings should be in the \"resources/\" folder, otherwise change the pathes in the model_params.json or directly in the code\n",
    "keras_models.model_params['wordembeddings'] = \"../resources/embeddings/glove/glove.6B.50d.txt\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded embeddings: (400002, 50)\n",
      "Parameters: {'rnn1_layers': 1, 'bidirectional': False, 'units1': 256, 'dropout1': 0.5, 'optimizer': 'adam', 'window_size': 3, 'position_emb': 3, 'batch_size': 128, 'gpu': True, 'property2idx': 'property2idx.txt', 'wordembeddings': '/home/local/UKP/sorokin/IdeaProjects/resources/embeddings/glove/glove.6B.50d.txt', 'max_sent_len': 36}\n"
     ]
    }
   ],
   "source": [
    "# The downloaded pretrained models should be in the \"trainedmodels/\" folder\n",
    "relparser = RelParser(\"model_ContextWeighted\", models_folder=\"trainedmodels/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Label the edges in the sentence graph using the pre-trained model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|##########| 1/1 [00:00<00:00, 7219.11it/s]\n"
     ]
    }
   ],
   "source": [
    "parsed_graph = relparser.classify_graph_relations([non_parsed_graph])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The output is a dictionary, the labeled edges are stored in the 'edgeSet' field. \n",
    "- 'kbID' contains the wikidata identifier of the assigned relation, 'P0' stands for an empty relation\n",
    "- 'lexicalInput' contains a human readable relation label"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'tokens': ['Germany', 'is', 'a', 'country', 'in', 'Europe'], 'edgeSet': [{'left': [0], 'right': [5], 'kbID': 'P30', 'lexicalInput': 'continent'}, {'left': [0], 'right': [3], 'kbID': 'P0', 'lexicalInput': 'ALL_ZERO'}, {'left': [5], 'right': [3], 'kbID': 'P31', 'lexicalInput': 'instance of'}]}\n"
     ]
    }
   ],
   "source": [
    "print(parsed_graph[0])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:sem-env]",
   "language": "python",
   "name": "conda-env-sem-env-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
